There is a solar power plant named “Kıvanç 2 Güneş Enerji Santrali” in Mersin, Turkey, located between 36-37° north latitude and 33-35° east longitude. The aim of this project is to analyze the behavior of the plant’s hourly solar electricity production using past data and to choose an approach for predicting future production. Following the ‘persistence approach’, the forecasts will use lag 48, which corresponds to 2 days.
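One way to read the persistence approach with lag 48 is that the forecast for a given hour equals the production observed 48 hours earlier. A minimal sketch in base R follows; the helper `persistence_forecast` and the toy series are illustrative assumptions, not part of the project code or data.

```r
# Hypothetical helper: forecast each hour with the value observed `lag` hours earlier.
persistence_forecast <- function(x, lag = 48) {
  stopifnot(lag >= 1, length(x) > lag)
  c(rep(NA_real_, lag), x[seq_len(length(x) - lag)])
}

# Toy series: 4 days of a stylised daily production profile (not real plant data).
hours <- rep(0:23, times = 4)
toy_production <- pmax(0, sin((hours - 6) / 12 * pi)) * 30

forecast_48 <- persistence_forecast(toy_production, lag = 48)

# Mean absolute error over the hours where a forecast exists.
mae <- mean(abs(toy_production - forecast_48), na.rm = TRUE)
print(mae)
```

Because the toy profile repeats identically every day, the error here is zero; on real data the same calculation would give a baseline against which the regression models can be compared.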
There are some variables that might affect the production rates:
TEMP: This is the temperature at this location. Temperature has two effects. First, it can represent seasonality, since hourly temperature values are similar within a season. Second, the efficiency of solar panels decreases at higher temperatures.
REL_HUMIDITY: This is the relative humidity at the given location. It gives information about rainy or cloudy periods. Such periods, which are associated with high relative humidity, potentially decrease production.
DSWRF: This is short for downward shortwave radiation flux, which is known to be highly important for the production level.
CLOUD_LOW_LAYER: This is the total cloud cover (as a percentage) for low-level clouds, which is also expected to affect the production rate.
Looking at the pairwise correlations between these variables and the production data shows the possible relations between them and gives an idea of which regressors to choose when building prediction models.
Loading Necessary Libraries:
library(xlsx)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(zoo)
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(ggplot2)
library(RcppRoll)
library(GGally)
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
library(skimr)
library(forecast)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(data.table)
##
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
##
## between, first, last
## The following objects are masked from 'package:lubridate':
##
## hour, isoweek, mday, minute, month, quarter, second, wday, week,
## yday, year
library(reshape)
##
## Attaching package: 'reshape'
## The following object is masked from 'package:data.table':
##
## melt
## The following object is masked from 'package:dplyr':
##
## rename
## The following object is masked from 'package:lubridate':
##
## stamp
library(reshape2)
##
## Attaching package: 'reshape2'
## The following objects are masked from 'package:reshape':
##
## colsplit, melt, recast
## The following objects are masked from 'package:data.table':
##
## dcast, melt
library(readr)
library(caTools)
Data Manipulation:
long_weather <- data.table(read_csv("~/Desktop/project_data/long_weather.csv"))
## Rows: 403488 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): variable
## dbl (4): hour, lat, lon, value
## date (1): date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
production <- data.table(read_csv("~/Desktop/project_data/production.csv"))
## Rows: 10896 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (2): hour, production
## date (1): date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
str(long_weather)
## Classes 'data.table' and 'data.frame': 403488 obs. of 6 variables:
## $ date : Date, format: "2021-02-01" "2021-02-01" ...
## $ hour : num 0 1 2 3 4 5 6 7 8 9 ...
## $ lat : num 36.2 36.2 36.2 36.2 36.2 ...
## $ lon : num 33 33 33 33 33 33 33 33 33 33 ...
## $ variable: chr "DSWRF" "DSWRF" "DSWRF" "DSWRF" ...
## $ value : num 0 0 0 0 0 0 0 0 0 3 ...
## - attr(*, ".internal.selfref")=<externalptr>
wide_weather= dcast(long_weather, date+hour~lat+lon+variable)
data <- data.table(merge(wide_weather,production))
data[,AverageTEMP:=rowMeans(data[,c("36.75_33.5_TEMP","36.75_33.25_TEMP","36.75_33_TEMP","36.5_33.5_TEMP","36.5_33.25_TEMP","36.5_33_TEMP","36.25_33.5_TEMP","36.25_33.25_TEMP","36.25_33_TEMP")])]
data[,AverageREL_HUMIDITY:=rowMeans(data[,c("36.75_33.5_REL_HUMIDITY","36.75_33.25_REL_HUMIDITY","36.75_33_REL_HUMIDITY","36.5_33.5_REL_HUMIDITY","36.5_33.25_REL_HUMIDITY","36.5_33_REL_HUMIDITY","36.25_33.5_REL_HUMIDITY","36.25_33.25_REL_HUMIDITY","36.25_33_REL_HUMIDITY")])]
data[,AverageDSWRF:=rowMeans(data[,c("36.75_33.5_DSWRF","36.75_33.25_DSWRF","36.75_33_DSWRF","36.5_33.5_DSWRF","36.5_33.25_DSWRF","36.5_33_DSWRF","36.25_33.5_DSWRF","36.25_33.25_DSWRF","36.25_33_DSWRF")])]
data[,AverageCLOUD_LOW_LAYER:=rowMeans(data[,c("36.75_33.5_CLOUD_LOW_LAYER","36.75_33.25_CLOUD_LOW_LAYER","36.75_33_CLOUD_LOW_LAYER","36.5_33.5_CLOUD_LOW_LAYER","36.5_33.25_CLOUD_LOW_LAYER","36.5_33_CLOUD_LOW_LAYER","36.25_33.5_CLOUD_LOW_LAYER","36.25_33.25_CLOUD_LOW_LAYER","36.25_33_CLOUD_LOW_LAYER")])]
data <- data[order(date, hour)]
data[, Year:=as.factor(year(date))]
data[,Month := as.factor(month(date))]
data[,Hour_factor := as.factor(hour)]
data[,max_in_month:=runmax(x=data$production, k=720, align = "left")]
data[,max_in_week:=runmax(x=data$production, k=168, align ="left")]
data[hour<=5|hour>=21,night:=1]
data[hour<21&hour>5,night:=0]
data$night <- as.factor(data$night)
data[,Lag1:=c(NA, data$production[1:(.N-1)])]
data[,Lag_week:=c(rep(NA,168), data$production[1:(.N-24*7)])]
data[,Lag_day:=c(rep(NA,24), data$production[1:(.N-24)])]
data[, Trend:=(1:.N)]
colnames(data) <- c("date","hour","CLOUD_LOW_LAYER_36.25_33","DSWRF_36.25_33","REL_HUMIDITY_36.25_33","TEMP_36.25_33","CLOUD_LOW_LAYER_36.25_33.25","DSWRF_36.25_33.25","REL_HUMIDITY_36.25_33.25","TEMP_36.25_33.25","CLOUD_LOW_LAYER_36.25_33.5","DSWRF_36.25_33.5","REL_HUMIDITY_36.25_33.5","TEMP_36.25_33.5","CLOUD_LOW_LAYER_36.5_33","DSWRF_36.5_33","REL_HUMIDITY_36.5_33","TEMP_36.5_33","CLOUD_LOW_LAYER_36.5_33.25","DSWRF_36.5_33.25","REL_HUMIDITY_36.5_33.25","TEMP_36.5_33.25","CLOUD_LOW_LAYER_36.5_33.5","DSWRF_36.5_33.5","REL_HUMIDITY_36.5_33.5","TEMP_36.5_33.5","CLOUD_LOW_LAYER_36.75_33","DSWRF_36.75_33","REL_HUMIDITY_36.75_33","TEMP_36.75_33","CLOUD_LOW_LAYER_36.75_33.25","DSWRF_36.75_33.25","REL_HUMIDITY_36.75_33.25","TEMP_36.75_33.25","CLOUD_LOW_LAYER_36.75_33.5","DSWRF_36.75_33.5","REL_HUMIDITY_36.75_33.5","TEMP_36.75_33.5","production","AverageTEMP","AverageREL_HUMIDITY","AverageDSWRF","AverageCLOUD_LOW_LAYER","Year","Month","Hour_factor","max_in_month","max_in_week","night","Lag1","Lag_week","Lag_day","Trend")
View(data)
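The four averaging steps above list all nine grid-point columns by hand. A more compact variant, sketched here in base R on a toy data frame, selects the columns by their variable suffix instead. The helper `average_by_suffix` and the toy values are illustrative assumptions, and the sketch assumes the columns still carry the `lat_lon_VARIABLE` names produced by `dcast`.

```r
# Hypothetical helper: row-wise mean over all columns whose name ends in `_<variable>`.
average_by_suffix <- function(df, variable) {
  cols <- grep(paste0("_", variable, "$"), names(df), value = TRUE)
  stopifnot(length(cols) > 0)
  rowMeans(df[, cols, drop = FALSE])
}

# Toy wide table with two grid points and two variables (values are made up).
toy_wide <- data.frame(
  `36.25_33_TEMP`  = c(10, 12),
  `36.5_33_TEMP`   = c(14, 16),
  `36.25_33_DSWRF` = c(0, 100),
  `36.5_33_DSWRF`  = c(0, 300),
  check.names = FALSE
)

toy_wide$AverageTEMP  <- average_by_suffix(toy_wide, "TEMP")
toy_wide$AverageDSWRF <- average_by_suffix(toy_wide, "DSWRF")
print(toy_wide[, c("AverageTEMP", "AverageDSWRF")])
```

The helper is written for a plain data frame. On a `data.table`, the column selection inside it would need `with = FALSE` or `.SDcols`.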
Data Analysis:
Before fitting any models, the data is first analyzed. Since the data covers a long period, only April and May 2022 are examined here. As can be seen below, production is 0 at night and reaches its maximum around midday. This makes sense because electricity production is directly driven by sunlight.
ggplot(subset(production,date >= "2022-04-01"),aes(x=date, y=production))+
geom_line()+geom_point()+ggtitle("Electricity Production April-May 2022")
Next, the full series is plotted to see whether any part of the data should be removed before creating the models.
plot(data$date, data$production, type="l", main="Plot of the Data")
plot(ts(data$production,frequency=24))
decomposed = decompose(ts(data$production,frequency=24))
plot(decomposed)
ggplot(data[date=="2022-05-06"], aes(x=hour, y= production)) +
geom_line(color= "red") +
labs(title = "Hourly Electricity Production Data on 06/05/22",
x = "Hour",
y= "Production (MWh)")
The additive decomposition plots show that the data is currently not stationary, which indicates that there is structure in the data that can be used to create good predictions. The plots show a clear trend as well as hourly, daily and yearly seasonality; these should be dealt with.
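One common way to deal with a daily seasonal pattern in hourly data is to difference the series at lag 24, that is, to subtract the value from the same hour of the previous day. The sketch below shows this on a toy series made of a fixed daily profile plus a linear trend. It only illustrates the idea and is not the approach taken later in this report, which relies on regression with dummy variables.

```r
# Toy hourly series: fixed daily profile plus a slow linear trend (made-up data).
n_days <- 10
hours <- rep(0:23, times = n_days)
daily_profile <- pmax(0, sin((hours - 6) / 12 * pi)) * 30
trend <- 0.01 * seq_along(hours)
toy_series <- ts(daily_profile + trend, frequency = 24)

# Seasonal differencing at lag 24 removes the repeating daily profile;
# what remains here is the constant 24-hour change in the trend.
seasonal_diff <- diff(toy_series, lag = 24)
print(range(seasonal_diff))

# Additive decomposition of the same toy series, as done above for the real data.
toy_decomposed <- decompose(toy_series)
print(round(head(toy_decomposed$seasonal, 24), 2))
```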
Lastly, to investigate the relations between the averaged variables and production, a correlation plot is created.
ggpairs(data,columns = c("AverageTEMP","AverageREL_HUMIDITY","AverageDSWRF","AverageCLOUD_LOW_LAYER","production"))
As can be seen, there is a high positive correlation between average DSWRF and production. After DSWRF, temperature also has a positive correlation with production.
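The pairs plot can be complemented by a numeric correlation matrix. The sketch below uses `cor()` from base R on a made-up data frame whose column names mirror the averaged variables. The values and the relationship between them are simulated purely for illustration and say nothing about the real data.

```r
set.seed(1)
n <- 200

# Simulated stand-ins for the averaged regressors (not real measurements).
toy <- data.frame(
  AverageDSWRF = runif(n, 0, 800),
  AverageTEMP  = rnorm(n, mean = 290, sd = 8)
)
# Simulated production that depends mostly on radiation, plus noise.
toy$production <- 0.04 * toy$AverageDSWRF + rnorm(n, sd = 3)

# Pairwise Pearson correlations; use = "complete.obs" drops rows with NAs.
cor_matrix <- cor(toy, use = "complete.obs")
print(round(cor_matrix, 2))

# Correlations with production, sorted by absolute size.
cor_with_production <- cor_matrix[, "production"]
regressors <- names(cor_with_production) != "production"
print(sort(abs(cor_with_production[regressors]), decreasing = TRUE))
```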
acf(data$production)
pacf(data$production)
The autocorrelation function shows sinusoidal behavior, while the partial autocorrelation function shows significance at the first two lags.
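To read a correlogram numerically, `acf()` can be called with `plot = FALSE` and indexed by lag. The sketch below does this for a toy series with a 24-hour cycle. The data are simulated, so the numbers only illustrate the kind of sinusoidal pattern described above; `acf_at` is a hypothetical convenience function.

```r
set.seed(42)

# Toy hourly series with a 24-hour cycle plus noise (made-up data).
t_index <- 1:(24 * 30)
toy_series <- 10 * sin(2 * pi * t_index / 24) + rnorm(length(t_index))

# Compute autocorrelations up to lag 48 without drawing the plot.
toy_acf <- acf(toy_series, lag.max = 48, plot = FALSE)

# toy_acf$acf[1] is lag 0, so lag k sits at position k + 1.
acf_at <- function(k) toy_acf$acf[k + 1]
print(round(c(lag12 = acf_at(12), lag24 = acf_at(24), lag48 = acf_at(48)), 2))
```

For a daily cycle, the autocorrelation is expected to be strongly negative at lag 12 and strongly positive at lags 24 and 48.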
Models:
model1 <- lm(production~Trend, data)
summary(model1)
##
## Call:
## lm(formula = production ~ Trend, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.056 -10.489 -9.863 11.636 29.687
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.831e+00 2.731e-01 35.998 < 2e-16 ***
## Trend 1.125e-04 4.341e-05 2.591 0.00959 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.25 on 10894 degrees of freedom
## Multiple R-squared: 0.0006158, Adjusted R-squared: 0.000524
## F-statistic: 6.712 on 1 and 10894 DF, p-value: 0.009587
checkresiduals(model1)
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: Residuals
## LM test = 9819.8, df = 10, p-value < 2.2e-16
AIC(model1)
## [1] 88824.38
BIC(model1)
## [1] 88846.26
model2 <- lm(production~Hour_factor, data)
summary(model2)
##
## Call:
## lm(formula = production ~ Hour_factor, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27.6673 -0.6648 0.0000 1.9038 19.4437
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.163e-14 3.676e-01 0.000 1.0000
## Hour_factor1 3.126e-12 5.199e-01 0.000 1.0000
## Hour_factor2 7.462e-13 5.199e-01 0.000 1.0000
## Hour_factor3 3.537e-13 5.199e-01 0.000 1.0000
## Hour_factor4 -1.472e-12 5.199e-01 0.000 1.0000
## Hour_factor5 2.510e-02 5.199e-01 0.048 0.9615
## Hour_factor6 1.011e+00 5.199e-01 1.944 0.0519 .
## Hour_factor7 8.393e+00 5.199e-01 16.145 < 2e-16 ***
## Hour_factor8 1.997e+01 5.199e-01 38.423 < 2e-16 ***
## Hour_factor9 2.603e+01 5.199e-01 50.079 < 2e-16 ***
## Hour_factor10 2.767e+01 5.199e-01 53.219 < 2e-16 ***
## Hour_factor11 2.788e+01 5.199e-01 53.623 < 2e-16 ***
## Hour_factor12 2.783e+01 5.199e-01 53.541 < 2e-16 ***
## Hour_factor13 2.755e+01 5.199e-01 52.998 < 2e-16 ***
## Hour_factor14 2.636e+01 5.199e-01 50.712 < 2e-16 ***
## Hour_factor15 2.441e+01 5.199e-01 46.945 < 2e-16 ***
## Hour_factor16 1.977e+01 5.199e-01 38.035 < 2e-16 ***
## Hour_factor17 1.049e+01 5.199e-01 20.179 < 2e-16 ***
## Hour_factor18 2.940e+00 5.199e-01 5.654 1.6e-08 ***
## Hour_factor19 2.964e-01 5.199e-01 0.570 0.5687
## Hour_factor20 1.370e-03 5.199e-01 0.003 0.9979
## Hour_factor21 6.235e-15 5.199e-01 0.000 1.0000
## Hour_factor22 5.342e-15 5.199e-01 0.000 1.0000
## Hour_factor23 3.869e-15 5.199e-01 0.000 1.0000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.833 on 10872 degrees of freedom
## Multiple R-squared: 0.6987, Adjusted R-squared: 0.6981
## F-statistic: 1096 on 23 and 10872 DF, p-value: < 2.2e-16
checkresiduals(model2)
##
## Breusch-Godfrey test for serial correlation of order up to 27
##
## data: Residuals
## LM test = 9084.3, df = 27, p-value < 2.2e-16
AIC(model2)
## [1] 75802.03
BIC(model2)
## [1] 75984.44
model3 <- lm(production~Hour_factor+Month, data)
summary(model3)
##
## Call:
## lm(formula = production ~ Hour_factor + Month, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -30.3370 -4.2010 0.3091 4.8116 16.1838
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.146e+00 4.082e-01 -10.155 < 2e-16 ***
## Hour_factor1 3.100e-12 4.549e-01 0.000 1.00000
## Hour_factor2 7.624e-13 4.549e-01 0.000 1.00000
## Hour_factor3 3.225e-13 4.549e-01 0.000 1.00000
## Hour_factor4 -1.562e-12 4.549e-01 0.000 1.00000
## Hour_factor5 2.510e-02 4.549e-01 0.055 0.95600
## Hour_factor6 1.011e+00 4.549e-01 2.222 0.02629 *
## Hour_factor7 8.393e+00 4.549e-01 18.451 < 2e-16 ***
## Hour_factor8 1.997e+01 4.549e-01 43.910 < 2e-16 ***
## Hour_factor9 2.603e+01 4.549e-01 57.230 < 2e-16 ***
## Hour_factor10 2.767e+01 4.549e-01 60.819 < 2e-16 ***
## Hour_factor11 2.788e+01 4.549e-01 61.281 < 2e-16 ***
## Hour_factor12 2.783e+01 4.549e-01 61.187 < 2e-16 ***
## Hour_factor13 2.755e+01 4.549e-01 60.566 < 2e-16 ***
## Hour_factor14 2.636e+01 4.549e-01 57.954 < 2e-16 ***
## Hour_factor15 2.441e+01 4.549e-01 53.649 < 2e-16 ***
## Hour_factor16 1.977e+01 4.549e-01 43.466 < 2e-16 ***
## Hour_factor17 1.049e+01 4.549e-01 23.061 < 2e-16 ***
## Hour_factor18 2.940e+00 4.549e-01 6.462 1.08e-10 ***
## Hour_factor19 2.964e-01 4.549e-01 0.651 0.51477
## Hour_factor20 1.370e-03 4.549e-01 0.003 0.99760
## Hour_factor21 -4.633e-14 4.549e-01 0.000 1.00000
## Hour_factor22 -4.054e-14 4.549e-01 0.000 1.00000
## Hour_factor23 -3.691e-14 4.549e-01 0.000 1.00000
## Month2 -6.661e-01 3.211e-01 -2.075 0.03804 *
## Month3 1.668e+00 3.147e-01 5.301 1.18e-07 ***
## Month4 4.470e+00 3.164e-01 14.128 < 2e-16 ***
## Month5 6.221e+00 3.470e-01 17.928 < 2e-16 ***
## Month6 8.983e+00 3.643e-01 24.655 < 2e-16 ***
## Month7 1.008e+01 3.614e-01 27.880 < 2e-16 ***
## Month8 9.679e+00 3.707e-01 26.111 < 2e-16 ***
## Month9 8.080e+00 3.643e-01 22.176 < 2e-16 ***
## Month10 6.163e+00 3.614e-01 17.051 < 2e-16 ***
## Month11 2.172e+00 3.643e-01 5.963 2.56e-09 ***
## Month12 -1.116e+00 3.614e-01 -3.089 0.00201 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.854 on 10861 degrees of freedom
## Multiple R-squared: 0.7696, Adjusted R-squared: 0.7688
## F-statistic: 1067 on 34 and 10861 DF, p-value: < 2.2e-16
checkresiduals(model3)
##
## Breusch-Godfrey test for serial correlation of order up to 38
##
## data: Residuals
## LM test = 8553.8, df = 38, p-value < 2.2e-16
AIC(model3)
## [1] 72904.12
BIC(model3)
## [1] 73166.78
model4 <- lm(production~Hour_factor+Month+Year, data)
summary(model4)
##
## Call:
## lm(formula = production ~ Hour_factor + Month + Year, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -30.3370 -4.2304 0.4865 4.6429 15.8822
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8.477e+00 4.443e-01 -19.079 < 2e-16 ***
## Hour_factor1 3.001e-12 4.449e-01 0.000 1.0000
## Hour_factor2 7.498e-13 4.449e-01 0.000 1.0000
## Hour_factor3 3.503e-13 4.449e-01 0.000 1.0000
## Hour_factor4 -1.583e-12 4.449e-01 0.000 1.0000
## Hour_factor5 2.510e-02 4.449e-01 0.056 0.9550
## Hour_factor6 1.011e+00 4.449e-01 2.272 0.0231 *
## Hour_factor7 8.393e+00 4.449e-01 18.865 < 2e-16 ***
## Hour_factor8 1.997e+01 4.449e-01 44.895 < 2e-16 ***
## Hour_factor9 2.603e+01 4.449e-01 58.514 < 2e-16 ***
## Hour_factor10 2.767e+01 4.449e-01 62.184 < 2e-16 ***
## Hour_factor11 2.788e+01 4.449e-01 62.655 < 2e-16 ***
## Hour_factor12 2.783e+01 4.449e-01 62.560 < 2e-16 ***
## Hour_factor13 2.755e+01 4.449e-01 61.925 < 2e-16 ***
## Hour_factor14 2.636e+01 4.449e-01 59.254 < 2e-16 ***
## Hour_factor15 2.441e+01 4.449e-01 54.853 < 2e-16 ***
## Hour_factor16 1.977e+01 4.449e-01 44.441 < 2e-16 ***
## Hour_factor17 1.049e+01 4.449e-01 23.578 < 2e-16 ***
## Hour_factor18 2.940e+00 4.449e-01 6.607 4.11e-11 ***
## Hour_factor19 2.964e-01 4.449e-01 0.666 0.5054
## Hour_factor20 1.370e-03 4.449e-01 0.003 0.9975
## Hour_factor21 -6.926e-14 4.449e-01 0.000 1.0000
## Hour_factor22 -6.902e-14 4.449e-01 0.000 1.0000
## Hour_factor23 -4.490e-14 4.449e-01 0.000 1.0000
## Month2 1.460e+00 3.283e-01 4.448 8.73e-06 ***
## Month3 3.834e+00 3.229e-01 11.874 < 2e-16 ***
## Month4 6.636e+00 3.245e-01 20.453 < 2e-16 ***
## Month5 9.850e+00 3.766e-01 26.153 < 2e-16 ***
## Month6 1.331e+01 4.062e-01 32.779 < 2e-16 ***
## Month7 1.441e+01 4.037e-01 35.692 < 2e-16 ***
## Month8 1.401e+01 4.116e-01 34.037 < 2e-16 ***
## Month9 1.241e+01 4.062e-01 30.556 < 2e-16 ***
## Month10 1.049e+01 4.037e-01 25.996 < 2e-16 ***
## Month11 6.504e+00 4.062e-01 16.013 < 2e-16 ***
## Month12 3.215e+00 4.037e-01 7.964 1.83e-15 ***
## Year2022 4.332e+00 1.949e-01 22.221 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.704 on 10860 degrees of freedom
## Multiple R-squared: 0.7796, Adjusted R-squared: 0.7789
## F-statistic: 1097 on 35 and 10860 DF, p-value: < 2.2e-16
checkresiduals(model4)
##
## Breusch-Godfrey test for serial correlation of order up to 39
##
## data: Residuals
## LM test = 8458.9, df = 39, p-value < 2.2e-16
AIC(model4)
## [1] 72421.66
BIC(model4)
## [1] 72691.62
model5 <- lm(production~Hour_factor+Month+Year+Trend, data)
summary(model5)
##
## Call:
## lm(formula = production ~ Hour_factor + Month + Year + Trend,
## data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -30.7113 -4.2507 0.6076 4.7898 15.4562
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8.0584033 0.4568902 -17.638 < 2e-16 ***
## Hour_factor1 -0.0011996 0.4446398 -0.003 0.997847
## Hour_factor2 -0.0023992 0.4446401 -0.005 0.995695
## Hour_factor3 -0.0035989 0.4446406 -0.008 0.993542
## Hour_factor4 -0.0047985 0.4446414 -0.011 0.991390
## Hour_factor5 0.0191031 0.4446423 0.043 0.965732
## Hour_factor6 1.0036926 0.4446435 2.257 0.024009 *
## Hour_factor7 8.3850900 0.4446449 18.858 < 2e-16 ***
## Hour_factor8 19.9653318 0.4446465 44.902 < 2e-16 ***
## Hour_factor9 26.0236963 0.4446483 58.526 < 2e-16 ***
## Hour_factor10 27.6553305 0.4446504 62.196 < 2e-16 ***
## Hour_factor11 27.8640225 0.4446526 62.665 < 2e-16 ***
## Hour_factor12 27.8203512 0.4446551 62.566 < 2e-16 ***
## Hour_factor13 27.5364941 0.4446577 61.927 < 2e-16 ***
## Hour_factor14 26.3471388 0.4446606 59.252 < 2e-16 ***
## Hour_factor15 24.3876844 0.4446637 54.845 < 2e-16 ***
## Hour_factor16 19.7540949 0.4446670 44.424 < 2e-16 ***
## Hour_factor17 10.4703401 0.4446706 23.546 < 2e-16 ***
## Hour_factor18 2.9179594 0.4446743 6.562 5.55e-11 ***
## Hour_factor19 0.2735587 0.4446783 0.615 0.538447
## Hour_factor20 -0.0226223 0.4446824 -0.051 0.959428
## Hour_factor21 -0.0251920 0.4446868 -0.057 0.954824
## Hour_factor22 -0.0263916 0.4446914 -0.059 0.952676
## Hour_factor23 -0.0275912 0.4446962 -0.062 0.950528
## Month2 0.6527270 0.3882210 1.681 0.092728 .
## Month3 2.1846903 0.5328047 4.100 4.16e-05 ***
## Month4 4.1085771 0.7260929 5.658 1.57e-08 ***
## Month5 6.5163564 0.9358607 6.963 3.52e-12 ***
## Month6 9.0506697 1.1686062 7.745 1.04e-14 ***
## Month7 9.2668571 1.3817534 6.707 2.09e-11 ***
## Month8 8.0194716 1.5938497 5.032 4.94e-07 ***
## Month9 5.5853457 1.8007975 3.102 0.001930 **
## Month10 2.7904970 2.0208349 1.381 0.167349
## Month11 -2.0781295 2.2428956 -0.927 0.354187
## Month12 -6.2450931 2.4648022 -2.534 0.011300 *
## Year2022 -5.9923372 2.6607068 -2.252 0.024332 *
## Trend 0.0011996 0.0003083 3.891 0.000101 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.699 on 10859 degrees of freedom
## Multiple R-squared: 0.7799, Adjusted R-squared: 0.7792
## F-statistic: 1069 on 36 and 10859 DF, p-value: < 2.2e-16
checkresiduals(model5)
##
## Breusch-Godfrey test for serial correlation of order up to 40
##
## data: Residuals
## LM test = 8457.4, df = 40, p-value < 2.2e-16
AIC(model5)
## [1] 72408.48
BIC(model5)
## [1] 72685.73
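The dummy variables for month, year and hour and the trend component each contribute to the model: they bring the residuals closer to the assumptions of normality, zero mean and constant variance, and each increases the adjusted R-squared. The Breusch-Godfrey tests still indicate strong serial correlation in the residuals, which motivates adding lagged production values.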
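Because each model above is followed by separate `summary()`, `AIC()` and `BIC()` calls, it can help to gather the criteria in one table. The sketch below does this for two toy models fitted to simulated data. The helper `compare_models` and the data are assumptions made for illustration; the same helper could be applied to a named list such as `list(model1 = model1, model2 = model2)`.

```r
# Hypothetical helper: collect adjusted R-squared, AIC and BIC for a named list of lm fits.
compare_models <- function(models) {
  data.frame(
    model  = names(models),
    adj_r2 = sapply(models, function(m) summary(m)$adj.r.squared),
    AIC    = sapply(models, AIC),
    BIC    = sapply(models, BIC),
    row.names = NULL
  )
}

set.seed(7)
# Simulated hourly data with a daily profile (not the plant data).
toy <- data.frame(hour = rep(0:23, times = 20))
toy$Trend <- seq_len(nrow(toy))
toy$Hour_factor <- as.factor(toy$hour)
toy$production <- pmax(0, sin((toy$hour - 6) / 12 * pi)) * 30 + rnorm(nrow(toy))

toy_models <- list(
  trend_only = lm(production ~ Trend, data = toy),
  hour_dummy = lm(production ~ Hour_factor, data = toy)
)
comparison <- compare_models(toy_models)
print(comparison)
```

Lower AIC and BIC and a higher adjusted R-squared point to the better-fitting model, which is the comparison made between the models in this report.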
model6 <- lm(production~Hour_factor+Month+Year+Trend+Lag1, data)
summary(model6)
##
## Call:
## lm(formula = production ~ Hour_factor + Month + Year + Trend +
## Lag1, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -35.651 -0.984 0.103 1.186 20.969
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.4417670 0.2644242 -5.452 5.08e-08 ***
## Hour_factor1 0.0027064 0.2537093 0.011 0.9915
## Hour_factor2 0.0024913 0.2537094 0.010 0.9922
## Hour_factor3 0.0022762 0.2537096 0.009 0.9928
## Hour_factor4 0.0020612 0.2537100 0.008 0.9935
## Hour_factor5 0.0269472 0.2537104 0.106 0.9154
## Hour_factor6 0.9919016 0.2537111 3.910 9.30e-05 ***
## Hour_factor7 7.5644930 0.2537730 29.808 < 2e-16 ***
## Hour_factor8 13.0811799 0.2578442 50.733 < 2e-16 ***
## Hour_factor9 9.6267890 0.2762756 34.845 < 2e-16 ***
## Hour_factor10 6.2816926 0.2909976 21.587 < 2e-16 ***
## Hour_factor11 5.1500545 0.2954743 17.430 < 2e-16 ***
## Hour_factor12 4.9349493 0.2960624 16.669 < 2e-16 ***
## Hour_factor13 4.6869658 0.2959405 15.838 < 2e-16 ***
## Hour_factor14 3.7307879 0.2951452 12.641 < 2e-16 ***
## Hour_factor15 2.7483455 0.2918747 9.416 < 2e-16 ***
## Hour_factor16 -0.2756233 0.2867269 -0.961 0.3364
## Hour_factor17 -5.7530509 0.2758310 -20.857 < 2e-16 ***
## Hour_factor18 -5.6791599 0.2601361 -21.831 < 2e-16 ***
## Hour_factor19 -2.1195510 0.2542369 -8.337 < 2e-16 ***
## Hour_factor20 -0.2434522 0.2537369 -0.959 0.3373
## Hour_factor21 -0.0027207 0.2537344 -0.011 0.9914
## Hour_factor22 -0.0018103 0.2537369 -0.007 0.9943
## Hour_factor23 -0.0020254 0.2537396 -0.008 0.9936
## Month2 0.1151993 0.2214565 0.520 0.6029
## Month3 0.3890823 0.3040947 1.279 0.2008
## Month4 0.7319002 0.4147105 1.765 0.0776 .
## Month5 1.1613577 0.5349216 2.171 0.0299 *
## Month6 1.6132120 0.6683094 2.414 0.0158 *
## Month7 1.6511445 0.7896689 2.091 0.0366 *
## Month8 1.4277987 0.9100663 1.569 0.1167
## Month9 0.9925881 1.0274957 0.966 0.3341
## Month10 0.4929430 1.1526415 0.428 0.6689
## Month11 -0.3769457 1.2792457 -0.295 0.7683
## Month12 -1.1215624 1.4061773 -0.798 0.4251
## Year2022 -1.0770902 1.5178519 -0.710 0.4780
## Trend 0.0002151 0.0001760 1.222 0.2217
## Lag1 0.8214642 0.0054729 150.097 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.82 on 10857 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.9284, Adjusted R-squared: 0.9282
## F-statistic: 3806 on 37 and 10857 DF, p-value: < 2.2e-16
checkresiduals(model6)
##
## Breusch-Godfrey test for serial correlation of order up to 41
##
## data: Residuals
## LM test = 3401.6, df = 41, p-value < 2.2e-16
AIC(model6)
## [1] 60164.98
BIC(model6)
## [1] 60449.52
model7 <- lm(production~Hour_factor+Month+Year+Trend+Lag1+Lag_day, data)
summary(model7)
##
## Call:
## lm(formula = production ~ Hour_factor + Month + Year + Trend +
## Lag1 + Lag_day, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -36.705 -0.427 0.032 0.641 19.776
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.846e-01 2.479e-01 -2.359 0.01836 *
## Hour_factor1 7.149e-05 2.370e-01 0.000 0.99976
## Hour_factor2 1.430e-04 2.370e-01 0.001 0.99952
## Hour_factor3 2.145e-04 2.370e-01 0.001 0.99928
## Hour_factor4 2.860e-04 2.370e-01 0.001 0.99904
## Hour_factor5 1.950e-02 2.370e-01 0.082 0.93442
## Hour_factor6 7.555e-01 2.371e-01 3.186 0.00144 **
## Hour_factor7 5.702e+00 2.417e-01 23.591 < 2e-16 ***
## Hour_factor8 9.391e+00 2.583e-01 36.351 < 2e-16 ***
## Hour_factor9 5.947e+00 2.745e-01 21.664 < 2e-16 ***
## Hour_factor10 2.978e+00 2.846e-01 10.462 < 2e-16 ***
## Hour_factor11 2.000e+00 2.876e-01 6.956 3.71e-12 ***
## Hour_factor12 1.826e+00 2.878e-01 6.343 2.34e-10 ***
## Hour_factor13 1.638e+00 2.873e-01 5.700 1.23e-08 ***
## Hour_factor14 9.225e-01 2.850e-01 3.237 0.00121 **
## Hour_factor15 2.520e-01 2.801e-01 0.899 0.36842
## Hour_factor16 -1.921e+00 2.713e-01 -7.081 1.52e-12 ***
## Hour_factor17 -5.753e+00 2.578e-01 -22.316 < 2e-16 ***
## Hour_factor18 -5.052e+00 2.436e-01 -20.740 < 2e-16 ***
## Hour_factor19 -1.817e+00 2.376e-01 -7.646 2.24e-14 ***
## Hour_factor20 -2.036e-01 2.370e-01 -0.859 0.39036
## Hour_factor21 5.485e-04 2.370e-01 0.002 0.99815
## Hour_factor22 1.573e-03 2.370e-01 0.007 0.99471
## Hour_factor23 1.644e-03 2.370e-01 0.007 0.99447
## Month2 1.792e-01 2.075e-01 0.863 0.38797
## Month3 3.683e-01 2.842e-01 1.296 0.19500
## Month4 5.920e-01 3.878e-01 1.526 0.12692
## Month5 8.864e-01 5.003e-01 1.772 0.07645 .
## Month6 1.151e+00 6.251e-01 1.842 0.06557 .
## Month7 1.247e+00 7.387e-01 1.687 0.09156 .
## Month8 1.263e+00 8.514e-01 1.484 0.13783
## Month9 1.187e+00 9.613e-01 1.235 0.21703
## Month10 1.103e+00 1.079e+00 1.022 0.30665
## Month11 9.576e-01 1.197e+00 0.800 0.42385
## Month12 7.195e-01 1.316e+00 0.547 0.58472
## Year2022 8.960e-01 1.421e+00 0.630 0.52841
## Trend -7.149e-05 1.649e-04 -0.434 0.66460
## Lag1 6.939e-01 6.032e-03 115.043 < 2e-16 ***
## Lag_day 2.402e-01 6.022e-03 39.891 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.567 on 10833 degrees of freedom
## (24 observations deleted due to missingness)
## Multiple R-squared: 0.9377, Adjusted R-squared: 0.9375
## F-statistic: 4290 on 38 and 10833 DF, p-value: < 2.2e-16
checkresiduals(model7)
##
## Breusch-Godfrey test for serial correlation of order up to 42
##
## data: Residuals
## LM test = 2201.5, df = 42, p-value < 2.2e-16
AIC(model7)
## [1] 58546.93
BIC(model7)
## [1] 58838.68
model8 <- lm(production~Hour_factor+Month+Year+Trend+Lag1+Lag_week+Lag_day, data)
summary(model8)
##
## Call:
## lm(formula = production ~ Hour_factor + Month + Year + Trend +
## Lag1 + Lag_week + Lag_day, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -36.939 -0.398 0.001 0.745 20.361
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.1752621 0.2459746 -0.713 0.47616
## Hour_factor1 0.0002648 0.2347595 0.001 0.99910
## Hour_factor2 0.0005297 0.2347597 0.002 0.99820
## Hour_factor3 0.0007945 0.2347600 0.003 0.99730
## Hour_factor4 0.0010594 0.2347604 0.005 0.99640
## Hour_factor5 0.0187726 0.2347609 0.080 0.93627
## Hour_factor6 0.6936325 0.2348648 2.953 0.00315 **
## Hour_factor7 5.1798796 0.2414369 21.454 < 2e-16 ***
## Hour_factor8 8.2417698 0.2643670 31.175 < 2e-16 ***
## Hour_factor9 4.6136925 0.2819794 16.362 < 2e-16 ***
## Hour_factor10 1.6873894 0.2914684 5.789 7.27e-09 ***
## Hour_factor11 0.7371912 0.2940021 2.507 0.01218 *
## Hour_factor12 0.5507862 0.2942277 1.872 0.06124 .
## Hour_factor13 0.3934491 0.2934229 1.341 0.17998
## Hour_factor14 -0.2633223 0.2902959 -0.907 0.36438
## Hour_factor15 -0.8156499 0.2842695 -2.869 0.00412 **
## Hour_factor16 -2.7035864 0.2727440 -9.913 < 2e-16 ***
## Hour_factor17 -5.9976207 0.2562296 -23.407 < 2e-16 ***
## Hour_factor18 -5.0067617 0.2414939 -20.732 < 2e-16 ***
## Hour_factor19 -1.7705045 0.2354164 -7.521 5.88e-14 ***
## Hour_factor20 -0.1935733 0.2347899 -0.824 0.40970
## Hour_factor21 0.0046380 0.2347848 0.020 0.98424
## Hour_factor22 0.0058266 0.2347873 0.025 0.98020
## Hour_factor23 0.0060914 0.2347899 0.026 0.97930
## Month2 0.2632169 0.2085440 1.262 0.20692
## Month3 0.4717311 0.2807094 1.680 0.09289 .
## Month4 0.6799372 0.3839411 1.771 0.07660 .
## Month5 0.9130761 0.4949861 1.845 0.06512 .
## Month6 1.1872906 0.6188888 1.918 0.05508 .
## Month7 1.2894638 0.7319553 1.762 0.07815 .
## Month8 1.4678309 0.8440602 1.739 0.08206 .
## Month9 1.5600370 0.9535314 1.636 0.10186
## Month10 1.6357000 1.0702943 1.528 0.12647
## Month11 1.8647378 1.1889689 1.568 0.11683
## Month12 1.8984348 1.3079124 1.451 0.14667
## Year2022 2.3104517 1.4127153 1.635 0.10198
## Trend -0.0002648 0.0001643 -1.612 0.10702
## Lag1 0.6638716 0.0061568 107.827 < 2e-16 ***
## Lag_week 0.1235608 0.0059751 20.679 < 2e-16 ***
## Lag_day 0.1937060 0.0063178 30.660 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.51 on 10688 degrees of freedom
## (168 observations deleted due to missingness)
## Multiple R-squared: 0.9402, Adjusted R-squared: 0.94
## F-statistic: 4307 on 39 and 10688 DF, p-value: < 2.2e-16
checkresiduals(model8)
##
## Breusch-Godfrey test for serial correlation of order up to 43
##
## data: Residuals
## LM test = 2133.4, df = 43, p-value < 2.2e-16
AIC(model8)
## [1] 57424.93
BIC(model8)
## [1] 57723.44
The autoregressive lags of one hour, one day and one week each contribute to the model and increase the adjusted R-squared value.
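The lag columns used in these models can be built with one small helper. The sketch below is a base R illustration on a toy vector, and `lag_vec` is a hypothetical name. It also builds a 48-hour lag, on the assumption that a forecast issued two days ahead, as described in the introduction, can only use values observed at least 48 hours earlier.

```r
# Hypothetical helper: shift a vector back by k steps, padding the start with NA.
lag_vec <- function(x, k) {
  stopifnot(k >= 0, k < length(x))
  c(rep(NA, k), x[seq_len(length(x) - k)])
}

# Toy production vector covering 8 days of hourly values (made-up numbers).
toy_production <- as.numeric(1:(24 * 8))

toy_lags <- data.frame(
  production = toy_production,
  Lag1       = lag_vec(toy_production, 1),
  Lag_day    = lag_vec(toy_production, 24),
  Lag48      = lag_vec(toy_production, 48),  # assumption: usable for a 2-day-ahead forecast
  Lag_week   = lag_vec(toy_production, 24 * 7)
)
print(toy_lags[170:172, ])
```

Rows with an `NA` in any lag column are dropped by `lm()`, which is why the model summaries above report observations deleted due to missingness.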
model9 <- lm(production~.-AverageTEMP-AverageREL_HUMIDITY-AverageDSWRF-AverageCLOUD_LOW_LAYER, data)
summary(model9)
##
## Call:
## lm(formula = production ~ . - AverageTEMP - AverageREL_HUMIDITY -
## AverageDSWRF - AverageCLOUD_LOW_LAYER, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -37.304 -0.708 -0.061 1.113 20.389
##
## Coefficients: (2 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.365e+02 1.554e+03 0.602 0.546855
## date -4.993e-02 8.326e-02 -0.600 0.548711
## hour 4.026e-04 1.063e-02 0.038 0.969792
## CLOUD_LOW_LAYER_36.25_33 -2.478e-03 3.100e-03 -0.800 0.424016
## DSWRF_36.25_33 -2.122e-03 1.603e-03 -1.324 0.185628
## REL_HUMIDITY_36.25_33 1.134e-02 9.506e-03 1.193 0.233052
## TEMP_36.25_33 1.431e-01 7.327e-02 1.953 0.050800 .
## CLOUD_LOW_LAYER_36.25_33.25 -1.114e-03 4.212e-03 -0.264 0.791439
## DSWRF_36.25_33.25 1.745e-03 2.146e-03 0.813 0.416046
## REL_HUMIDITY_36.25_33.25 -1.880e-02 1.414e-02 -1.329 0.183842
## TEMP_36.25_33.25 1.352e-01 1.049e-01 1.289 0.197430
## CLOUD_LOW_LAYER_36.25_33.5 -1.616e-03 3.470e-03 -0.466 0.641576
## DSWRF_36.25_33.5 -1.301e-03 1.716e-03 -0.758 0.448369
## REL_HUMIDITY_36.25_33.5 8.999e-03 9.861e-03 0.913 0.361458
## TEMP_36.25_33.5 -1.098e-01 7.020e-02 -1.564 0.117846
## CLOUD_LOW_LAYER_36.5_33 -4.014e-03 3.089e-03 -1.299 0.193899
## DSWRF_36.5_33 7.490e-03 1.539e-03 4.869 1.14e-06 ***
## REL_HUMIDITY_36.5_33 -3.216e-04 9.048e-03 -0.036 0.971649
## TEMP_36.5_33 -1.502e-01 5.770e-02 -2.603 0.009264 **
## CLOUD_LOW_LAYER_36.5_33.25 -2.968e-03 3.997e-03 -0.743 0.457717
## DSWRF_36.5_33.25 -5.485e-03 1.926e-03 -2.847 0.004421 **
## REL_HUMIDITY_36.5_33.25 -1.939e-02 1.365e-02 -1.420 0.155684
## TEMP_36.5_33.25 -6.633e-02 9.590e-02 -0.692 0.489194
## CLOUD_LOW_LAYER_36.5_33.5 -9.666e-03 3.804e-03 -2.541 0.011064 *
## DSWRF_36.5_33.5 2.852e-03 1.813e-03 1.574 0.115622
## REL_HUMIDITY_36.5_33.5 6.588e-03 1.132e-02 0.582 0.560635
## TEMP_36.5_33.5 -3.248e-02 6.162e-02 -0.527 0.598164
## CLOUD_LOW_LAYER_36.75_33 1.174e-03 2.637e-03 0.445 0.656200
## DSWRF_36.75_33 6.920e-05 1.306e-03 0.053 0.957752
## REL_HUMIDITY_36.75_33 1.938e-02 8.456e-03 2.292 0.021919 *
## TEMP_36.75_33 1.888e-01 6.374e-02 2.963 0.003058 **
## CLOUD_LOW_LAYER_36.75_33.25 -2.256e-03 4.249e-03 -0.531 0.595360
## DSWRF_36.75_33.25 -7.001e-04 1.993e-03 -0.351 0.725370
## REL_HUMIDITY_36.75_33.25 -2.921e-02 1.182e-02 -2.472 0.013459 *
## TEMP_36.75_33.25 -2.784e-01 8.457e-02 -3.292 0.000998 ***
## CLOUD_LOW_LAYER_36.75_33.5 3.943e-04 3.406e-03 0.116 0.907844
## DSWRF_36.75_33.5 -3.354e-03 1.566e-03 -2.142 0.032217 *
## REL_HUMIDITY_36.75_33.5 2.688e-02 7.922e-03 3.393 0.000693 ***
## TEMP_36.75_33.5 1.507e-01 6.747e-02 2.233 0.025544 *
## Year2022 3.951e+00 1.419e+00 2.784 0.005381 **
## Month2 4.089e-01 2.256e-01 1.812 0.069958 .
## Month3 1.010e+00 2.913e-01 3.467 0.000529 ***
## Month4 1.476e+00 4.017e-01 3.674 0.000240 ***
## Month5 1.605e+00 5.193e-01 3.090 0.002007 **
## Month6 1.931e+00 6.516e-01 2.963 0.003053 **
## Month7 2.210e+00 7.745e-01 2.853 0.004334 **
## Month8 2.454e+00 8.758e-01 2.803 0.005078 **
## Month9 2.613e+00 9.739e-01 2.683 0.007305 **
## Month10 2.569e+00 1.078e+00 2.382 0.017227 *
## Month11 3.000e+00 1.195e+00 2.511 0.012042 *
## Month12 3.521e+00 1.313e+00 2.681 0.007346 **
## Hour_factor1 -3.421e-02 2.262e-01 -0.151 0.879784
## Hour_factor2 -7.924e-02 2.219e-01 -0.357 0.721068
## Hour_factor3 -1.054e-01 2.182e-01 -0.483 0.629014
## Hour_factor4 -1.359e-01 2.152e-01 -0.631 0.527730
## Hour_factor5 -1.512e-01 2.126e-01 -0.711 0.476857
## Hour_factor6 4.846e-01 2.105e-01 2.302 0.021363 *
## Hour_factor7 4.897e+00 2.151e-01 22.763 < 2e-16 ***
## Hour_factor8 7.933e+00 2.394e-01 33.130 < 2e-16 ***
## Hour_factor9 4.475e+00 2.615e-01 17.115 < 2e-16 ***
## Hour_factor10 1.999e+00 2.961e-01 6.750 1.56e-11 ***
## Hour_factor11 1.246e+00 3.082e-01 4.043 5.32e-05 ***
## Hour_factor12 1.281e+00 3.176e-01 4.035 5.51e-05 ***
## Hour_factor13 1.350e+00 3.235e-01 4.172 3.05e-05 ***
## Hour_factor14 8.837e-01 3.249e-01 2.720 0.006544 **
## Hour_factor15 4.529e-01 3.218e-01 1.407 0.159374
## Hour_factor16 -1.367e+00 2.938e-01 -4.654 3.29e-06 ***
## Hour_factor17 -4.685e+00 2.716e-01 -17.247 < 2e-16 ***
## Hour_factor18 -3.869e+00 2.555e-01 -15.145 < 2e-16 ***
## Hour_factor19 -9.002e-01 2.461e-01 -3.658 0.000256 ***
## Hour_factor20 4.363e-01 2.400e-01 1.818 0.069042 .
## Hour_factor21 4.127e-01 2.367e-01 1.744 0.081253 .
## Hour_factor22 6.445e-02 2.264e-01 0.285 0.775859
## Hour_factor23 NA NA NA NA
## max_in_month -3.439e-03 2.795e-02 -0.123 0.902069
## max_in_week 4.343e-02 2.547e-02 1.705 0.088189 .
## night1 NA NA NA NA
## Lag1 6.489e-01 7.012e-03 92.543 < 2e-16 ***
## Lag_week 1.373e-01 6.033e-03 22.764 < 2e-16 ***
## Lag_day 1.958e-01 6.304e-03 31.059 < 2e-16 ***
## Trend 1.585e-03 3.519e-03 0.450 0.652511
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.451 on 10649 degrees of freedom
## (168 observations deleted due to missingness)
## Multiple R-squared: 0.9424, Adjusted R-squared: 0.9419
## F-statistic: 2232 on 78 and 10649 DF, p-value: < 2.2e-16
checkresiduals(model9)
##
## Breusch-Godfrey test for serial correlation of order up to 84
##
## data: Residuals
## LM test = 2541.7, df = 84, p-value < 2.2e-16
AIC(model9)
## [1] 57104.58
BIC(model9)
## [1] 57687.02
model10 <- lm(production~.-AverageTEMP-AverageREL_HUMIDITY-AverageDSWRF-AverageCLOUD_LOW_LAYER-DSWRF_36.25_33.5 -CLOUD_LOW_LAYER_36.25_33-CLOUD_LOW_LAYER_36.25_33.25-CLOUD_LOW_LAYER_36.25_33.5-REL_HUMIDITY_36.5_33-REL_HUMIDITY_36.5_33.5-TEMP_36.5_33.5 -CLOUD_LOW_LAYER_36.75_33 -DSWRF_36.75_33 -CLOUD_LOW_LAYER_36.75_33.25 -DSWRF_36.75_33.25-CLOUD_LOW_LAYER_36.75_33.5 , data)
summary(model10)
##
## Call:
## lm(formula = production ~ . - AverageTEMP - AverageREL_HUMIDITY -
## AverageDSWRF - AverageCLOUD_LOW_LAYER - DSWRF_36.25_33.5 -
## CLOUD_LOW_LAYER_36.25_33 - CLOUD_LOW_LAYER_36.25_33.25 -
## CLOUD_LOW_LAYER_36.25_33.5 - REL_HUMIDITY_36.5_33 - REL_HUMIDITY_36.5_33.5 -
## TEMP_36.5_33.5 - CLOUD_LOW_LAYER_36.75_33 - DSWRF_36.75_33 -
## CLOUD_LOW_LAYER_36.75_33.25 - DSWRF_36.75_33.25 - CLOUD_LOW_LAYER_36.75_33.5,
## data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -37.300 -0.711 -0.061 1.105 20.720
##
## Coefficients: (2 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.658e+02 1.550e+03 0.623 0.533315
## date -5.146e-02 8.305e-02 -0.620 0.535461
## hour 2.397e-04 1.062e-02 0.023 0.981996
## DSWRF_36.25_33 -1.402e-03 1.406e-03 -0.997 0.318964
## REL_HUMIDITY_36.25_33 8.654e-03 8.842e-03 0.979 0.327772
## TEMP_36.25_33 1.339e-01 6.979e-02 1.919 0.055055 .
## DSWRF_36.25_33.25 9.683e-04 1.455e-03 0.666 0.505744
## REL_HUMIDITY_36.25_33.25 -2.035e-02 1.382e-02 -1.473 0.140892
## TEMP_36.25_33.25 1.362e-01 1.026e-01 1.327 0.184640
## REL_HUMIDITY_36.25_33.5 1.158e-02 8.814e-03 1.313 0.189108
## TEMP_36.25_33.5 -1.183e-01 6.602e-02 -1.793 0.073042 .
## CLOUD_LOW_LAYER_36.5_33 -5.289e-03 2.541e-03 -2.082 0.037380 *
## DSWRF_36.5_33 6.965e-03 1.401e-03 4.972 6.73e-07 ***
## TEMP_36.5_33 -1.430e-01 4.650e-02 -3.076 0.002104 **
## CLOUD_LOW_LAYER_36.5_33.25 -4.107e-03 3.836e-03 -1.071 0.284395
## DSWRF_36.5_33.25 -5.504e-03 1.863e-03 -2.955 0.003131 **
## REL_HUMIDITY_36.5_33.25 -1.414e-02 9.243e-03 -1.530 0.126062
## TEMP_36.5_33.25 -8.931e-02 7.965e-02 -1.121 0.262204
## CLOUD_LOW_LAYER_36.5_33.5 -1.197e-02 3.367e-03 -3.554 0.000381 ***
## DSWRF_36.5_33.5 1.726e-03 1.659e-03 1.041 0.298107
## REL_HUMIDITY_36.75_33 2.003e-02 6.956e-03 2.879 0.003994 **
## TEMP_36.75_33 2.077e-01 5.789e-02 3.587 0.000335 ***
## REL_HUMIDITY_36.75_33.25 -3.002e-02 1.013e-02 -2.964 0.003045 **
## TEMP_36.75_33.25 -3.030e-01 7.859e-02 -3.855 0.000116 ***
## DSWRF_36.75_33.5 -3.586e-03 1.118e-03 -3.208 0.001339 **
## REL_HUMIDITY_36.75_33.5 2.709e-02 7.661e-03 3.536 0.000407 ***
## TEMP_36.75_33.5 1.545e-01 6.614e-02 2.335 0.019542 *
## Year2022 4.078e+00 1.412e+00 2.889 0.003877 **
## Month2 4.466e-01 2.232e-01 2.001 0.045369 *
## Month3 1.044e+00 2.896e-01 3.604 0.000315 ***
## Month4 1.524e+00 3.994e-01 3.815 0.000137 ***
## Month5 1.669e+00 5.158e-01 3.235 0.001220 **
## Month6 2.015e+00 6.475e-01 3.112 0.001862 **
## Month7 2.294e+00 7.703e-01 2.979 0.002901 **
## Month8 2.549e+00 8.707e-01 2.927 0.003425 **
## Month9 2.738e+00 9.683e-01 2.828 0.004699 **
## Month10 2.708e+00 1.072e+00 2.526 0.011557 *
## Month11 3.129e+00 1.188e+00 2.633 0.008468 **
## Month12 3.640e+00 1.306e+00 2.787 0.005331 **
## Hour_factor1 -3.137e-02 2.260e-01 -0.139 0.889641
## Hour_factor2 -7.446e-02 2.218e-01 -0.336 0.737050
## Hour_factor3 -1.014e-01 2.180e-01 -0.465 0.641873
## Hour_factor4 -1.312e-01 2.149e-01 -0.610 0.541694
## Hour_factor5 -1.472e-01 2.122e-01 -0.694 0.487868
## Hour_factor6 4.902e-01 2.101e-01 2.334 0.019634 *
## Hour_factor7 4.901e+00 2.148e-01 22.813 < 2e-16 ***
## Hour_factor8 7.931e+00 2.391e-01 33.172 < 2e-16 ***
## Hour_factor9 4.461e+00 2.604e-01 17.130 < 2e-16 ***
## Hour_factor10 1.996e+00 2.931e-01 6.810 1.03e-11 ***
## Hour_factor11 1.234e+00 3.050e-01 4.047 5.22e-05 ***
## Hour_factor12 1.263e+00 3.144e-01 4.017 5.94e-05 ***
## Hour_factor13 1.327e+00 3.204e-01 4.142 3.47e-05 ***
## Hour_factor14 8.576e-01 3.219e-01 2.665 0.007722 **
## Hour_factor15 4.251e-01 3.189e-01 1.333 0.182454
## Hour_factor16 -1.401e+00 2.910e-01 -4.814 1.50e-06 ***
## Hour_factor17 -4.715e+00 2.695e-01 -17.495 < 2e-16 ***
## Hour_factor18 -3.892e+00 2.535e-01 -15.352 < 2e-16 ***
## Hour_factor19 -9.123e-01 2.446e-01 -3.729 0.000193 ***
## Hour_factor20 4.320e-01 2.390e-01 1.808 0.070698 .
## Hour_factor21 4.170e-01 2.362e-01 1.766 0.077483 .
## Hour_factor22 6.336e-02 2.262e-01 0.280 0.779426
## Hour_factor23 NA NA NA NA
## max_in_month -6.034e-03 2.788e-02 -0.216 0.828674
## max_in_week 4.564e-02 2.541e-02 1.796 0.072479 .
## night1 NA NA NA NA
## Lag1 6.492e-01 6.993e-03 92.833 < 2e-16 ***
## Lag_week 1.375e-01 5.998e-03 22.920 < 2e-16 ***
## Lag_day 1.950e-01 6.271e-03 31.096 < 2e-16 ***
## Trend 1.633e-03 3.510e-03 0.465 0.641693
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.45 on 10661 degrees of freedom
## (168 observations deleted due to missingness)
## Multiple R-squared: 0.9423, Adjusted R-squared: 0.942
## F-statistic: 2639 on 66 and 10661 DF, p-value: < 2.2e-16
checkresiduals(model10)
##
## Breusch-Godfrey test for serial correlation of order up to 72
##
## data: Residuals
## LM test = 2521, df = 72, p-value < 2.2e-16
AIC(model10)
## [1] 57086.41
BIC(model10)
## [1] 57581.5
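All the exogenous variables are first added to the linear regression model (model9); some of the insignificant ones are then removed to obtain a more compact model (model10). The adjusted R-squared stays practically the same (0.9419 against 0.942), while AIC drops from 57104.58 to 57086.41 and BIC from 57687.02 to 57581.5.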
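The step from a full model to a more compact one can be sketched as below. This is only an illustration on the built-in mtcars data, not on the plant data; the choice of which regressors to drop (qsec and drat) is an assumption made for the example.

```r
# Toy illustration on the built-in mtcars data, not the plant data.
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# Drop regressors that look insignificant in summary(full).
# qsec and drat are removed here purely for illustration.
reduced <- update(full, . ~ . - qsec - drat)

# Lower AIC/BIC indicates a better trade-off between fit and complexity.
comparison <- data.frame(
  model  = c("full", "reduced"),
  n_coef = c(length(coef(full)), length(coef(reduced))),
  AIC    = c(AIC(full), AIC(reduced)),
  BIC    = c(BIC(full), BIC(reduced))
)
print(comparison)

# Partial F-test of the nested models: a large p-value suggests that
# the dropped terms add little explanatory power.
print(anova(reduced, full))
```

Because the two models are nested and fitted on the same rows, both the information criteria and the F-test can be compared directly.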
model11 <- auto.arima(data[,"production"], seasonal =TRUE, trace=T)
##
## Fitting models using approximations to speed things up...
##
## ARIMA(2,1,2) with drift : 65640.88
## ARIMA(0,1,0) with drift : 68977.85
## ARIMA(1,1,0) with drift : 65949.94
## ARIMA(0,1,1) with drift : 65852.52
## ARIMA(0,1,0) : 68975.84
## ARIMA(1,1,2) with drift : 65638.35
## ARIMA(0,1,2) with drift : 65639.46
## ARIMA(1,1,1) with drift : 65641.52
## ARIMA(1,1,3) with drift : 65640.25
## ARIMA(0,1,3) with drift : 65636.85
## ARIMA(0,1,4) with drift : 65638.17
## ARIMA(1,1,4) with drift : Inf
## ARIMA(0,1,3) : 65634.85
## ARIMA(0,1,2) : 65637.46
## ARIMA(1,1,3) : 65637.77
## ARIMA(0,1,4) : 65636.16
## ARIMA(1,1,2) : 65636.35
## ARIMA(1,1,4) : Inf
##
## Now re-fitting the best model(s) without approximations...
##
## ARIMA(0,1,3) : 65638.68
##
## Best model: ARIMA(0,1,3)
summary(model11)
## Series: data[, "production"]
## ARIMA(0,1,3)
##
## Coefficients:
## ma1 ma2 ma3
## 0.5803 0.1522 0.0203
## s.e. 0.0096 0.0111 0.0094
##
## sigma^2 = 24.2: log likelihood = -32815.34
## AIC=65638.68 AICc=65638.68 BIC=65667.86
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -7.927164e-08 4.918302 2.431535 NaN Inf 0.8797759 -0.0001083999
checkresiduals(model11)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,3)
## Q* = 454.1, df = 7, p-value < 2.2e-16
##
## Model df: 3. Total lags used: 10
AIC(model11)
## [1] 65638.68
BIC(model11)
## [1] 65667.86
model12 <- auto.arima(data[,"production"], seasonal =FALSE, trace=T)
##
## Fitting models using approximations to speed things up...
##
## ARIMA(2,1,2) with drift : 65640.88
## ARIMA(0,1,0) with drift : 68977.85
## ARIMA(1,1,0) with drift : 65949.94
## ARIMA(0,1,1) with drift : 65852.52
## ARIMA(0,1,0) : 68975.84
## ARIMA(1,1,2) with drift : 65638.35
## ARIMA(0,1,2) with drift : 65639.46
## ARIMA(1,1,1) with drift : 65641.52
## ARIMA(1,1,3) with drift : 65640.25
## ARIMA(0,1,3) with drift : 65636.85
## ARIMA(0,1,4) with drift : 65638.17
## ARIMA(1,1,4) with drift : Inf
## ARIMA(0,1,3) : 65634.85
## ARIMA(0,1,2) : 65637.46
## ARIMA(1,1,3) : 65637.77
## ARIMA(0,1,4) : 65636.16
## ARIMA(1,1,2) : 65636.35
## ARIMA(1,1,4) : Inf
##
## Now re-fitting the best model(s) without approximations...
##
## ARIMA(0,1,3) : 65638.68
##
## Best model: ARIMA(0,1,3)
summary(model12)
## Series: data[, "production"]
## ARIMA(0,1,3)
##
## Coefficients:
## ma1 ma2 ma3
## 0.5803 0.1522 0.0203
## s.e. 0.0096 0.0111 0.0094
##
## sigma^2 = 24.2: log likelihood = -32815.34
## AIC=65638.68 AICc=65638.68 BIC=65667.86
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -7.927164e-08 4.918302 2.431535 NaN Inf 0.8797759 -0.0001083999
checkresiduals(model12)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,3)
## Q* = 454.1, df = 7, p-value < 2.2e-16
##
## Model df: 3. Total lags used: 10
AIC(model12)
## [1] 65638.68
BIC(model12)
## [1] 65667.86
AvgDSWRF <- as.numeric(data[,AverageDSWRF])
model13 <- auto.arima(data[,"production"], xreg= AvgDSWRF, seasonal =FALSE, trace=T)
##
## Fitting models using approximations to speed things up...
##
## Regression with ARIMA(2,1,2) errors : Inf
## Regression with ARIMA(0,1,0) errors : 68398.07
## Regression with ARIMA(1,1,0) errors : 65934.24
## Regression with ARIMA(0,1,1) errors : 65679.35
## ARIMA(0,1,0) : 68396.07
## Regression with ARIMA(1,1,1) errors : 65564.14
## Regression with ARIMA(2,1,1) errors : 65544.59
## Regression with ARIMA(2,1,0) errors : 65550.69
## Regression with ARIMA(3,1,1) errors : 65547.36
## Regression with ARIMA(1,1,2) errors : 65550.36
## Regression with ARIMA(3,1,0) errors : 65546.36
## Regression with ARIMA(3,1,2) errors : Inf
## ARIMA(2,1,1) : 65542.59
## ARIMA(1,1,1) : 65562.11
## ARIMA(2,1,0) : 65548.63
## ARIMA(3,1,1) : 65545.36
## ARIMA(2,1,2) : Inf
## ARIMA(1,1,0) : 65932.23
## ARIMA(1,1,2) : 65548.35
## ARIMA(3,1,0) : 65544.31
## ARIMA(3,1,2) : Inf
##
## Now re-fitting the best model(s) without approximations...
##
## ARIMA(2,1,1) : 65544.35
##
## Best model: Regression with ARIMA(2,1,1) errors
summary(model13)
## Series: data[, "production"]
## Regression with ARIMA(2,1,1) errors
##
## Coefficients:
## ar1 ar2 ma1 xreg
## 0.4152 -0.1289 0.1448 0.0049
## s.e. 0.0489 0.0258 0.0494 0.0005
##
## sigma^2 = 23.99: log likelihood = -32767.17
## AIC=65544.35 AICc=65544.35 BIC=65580.83
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -2.406619e-05 4.89661 2.500338 NaN Inf 0.9046698 -6.247474e-05
checkresiduals(model13)
##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(2,1,1) errors
## Q* = 402.21, df = 6, p-value < 2.2e-16
##
## Model df: 4. Total lags used: 10
AIC(model13)
## [1] 65544.35
BIC(model13)
## [1] 65580.83
AvgTEMP <- as.numeric(data[,AverageTEMP])
model14 <- auto.arima(data[,"production"], xreg= AvgTEMP, seasonal =FALSE, trace=T)
##
## Fitting models using approximations to speed things up...
##
## Regression with ARIMA(2,1,2) errors : 64134.79
## Regression with ARIMA(0,1,0) errors : 66078.88
## Regression with ARIMA(1,1,0) errors : 64565.76
## Regression with ARIMA(0,1,1) errors : 64202.59
## ARIMA(0,1,0) : 66076.88
## Regression with ARIMA(1,1,2) errors : 64184.01
## Regression with ARIMA(2,1,1) errors : 64132.86
## Regression with ARIMA(1,1,1) errors : 64200.95
## Regression with ARIMA(2,1,0) errors : 64131.04
## Regression with ARIMA(3,1,0) errors : 64133.85
## Regression with ARIMA(3,1,1) errors : 64135.83
## ARIMA(2,1,0) : 64129.06
## ARIMA(1,1,0) : 64563.76
## ARIMA(3,1,0) : 64131.85
## ARIMA(2,1,1) : 64130.88
## ARIMA(1,1,1) : 64198.94
## ARIMA(3,1,1) : 64133.83
##
## Now re-fitting the best model(s) without approximations...
##
## ARIMA(2,1,0) : 64130.51
##
## Best model: Regression with ARIMA(2,1,0) errors
summary(model14)
## Series: data[, "production"]
## Regression with ARIMA(2,1,0) errors
##
## Coefficients:
## ar1 ar2 xreg
## 0.4367 -0.1985 2.4070
## s.e. 0.0096 0.0094 0.0544
##
## sigma^2 = 21.07: log likelihood = -32061.25
## AIC=64130.5 AICc=64130.51 BIC=64159.69
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.0009995426 4.589428 2.753393 NaN Inf 0.99623 0.0008446895
checkresiduals(model14)
##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(2,1,0) errors
## Q* = 413.5, df = 7, p-value < 2.2e-16
##
## Model df: 3. Total lags used: 10
AIC(model14)
## [1] 64130.5
BIC(model14)
## [1] 64159.69
model15 <- arima(data[,"production"],c(2,0,0))
summary(model15)
##
## Call:
## arima(x = data[, "production"], order = c(2, 0, 0))
##
## Coefficients:
## ar1 ar2 intercept
## 1.4298 -0.5557 10.4434
## s.e. 0.0080 0.0080 0.3552
##
## sigma^2 estimated as 21.81: log likelihood = -32255.29, aic = 64518.57
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.0001431244 4.670327 3.01387 NaN Inf 1.090475 0.02674318
checkresiduals(model15)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,0,0) with non-zero mean
## Q* = 542.14, df = 7, p-value < 2.2e-16
##
## Model df: 3. Total lags used: 10
AIC(model15)
## [1] 64518.57
BIC(model15)
## [1] 64547.76
model16 <- arima(data[,"production"],c(3,0,0))
summary(model16)
##
## Call:
## arima(x = data[, "production"], order = c(3, 0, 0))
##
## Coefficients:
## ar1 ar2 ar3 intercept
## 1.4565 -0.6243 0.0480 10.4437
## s.e. 0.0096 0.0158 0.0096 0.3727
##
## sigma^2 estimated as 21.76: log likelihood = -32242.72, aic = 64495.44
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.0001934271 4.664941 2.983898 NaN Inf 1.079631 0.007148439
checkresiduals(model16)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(3,0,0) with non-zero mean
## Q* = 485.75, df = 6, p-value < 2.2e-16
##
## Model df: 4. Total lags used: 10
AIC(model16)
## [1] 64495.44
BIC(model16)
## [1] 64531.92
model17 <- arima(data[,"production"],c(4,0,1))
summary(model17)
##
## Call:
## arima(x = data[, "production"], order = c(4, 0, 1))
##
## Coefficients:
## ar1 ar2 ar3 ar4 ma1 intercept
## 2.1593 -1.7501 0.7496 -0.2175 -0.7465 10.4458
## s.e. 0.0130 0.0245 0.0216 0.0096 0.0099 0.1859
##
## sigma^2 estimated as 20.23: log likelihood = -31845.09, aic = 63704.19
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.000289889 4.497696 2.962509 NaN Inf 1.071892 -0.01150789
checkresiduals(model17)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(4,0,1) with non-zero mean
## Q* = 78.318, df = 4, p-value = 4.441e-16
##
## Model df: 6. Total lags used: 10
AIC(model17)
## [1] 63704.19
BIC(model17)
## [1] 63755.26
Two kinds of ARIMA models are developed: models selected by auto.arima, and manually specified models whose orders are chosen from the autocorrelation and partial autocorrelation plots. The ARIMA models explain a large portion of the data; however, the best linear regression model performs better in terms of AIC and BIC (57086.41 and 57581.5 for model10, against 63704.19 and 63755.26 for model17, the best of the ARIMA models).
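A comparison of this kind can be collected in one table. The sketch below uses a simulated series rather than the production data, and both models are fitted on the same rows; this matters because information criteria are only directly comparable when the models describe the same observations. All names in it are hypothetical.

```r
set.seed(1)
n <- 500
# Simulated AR(1) series standing in for the production data
y <- as.numeric(arima.sim(list(ar = 0.7), n = n)) + 10

# Regression on the first lag; the first value is dropped because its lag is unknown
d <- data.frame(y = y[-1], lag1 = y[-n])
fit_lm <- lm(y ~ lag1, data = d)

# AR(1) fitted with stats::arima on the same rows
fit_ar <- arima(d$y, order = c(1, 0, 0))

scores <- data.frame(
  model = c("lm with lag1", "ARIMA(1,0,0)"),
  n_obs = c(nobs(logLik(fit_lm)), nobs(logLik(fit_ar))),
  AIC   = c(AIC(fit_lm), AIC(fit_ar)),
  BIC   = c(BIC(fit_lm), BIC(fit_ar))
)
# Smallest AIC first
print(scores[order(scores$AIC), ])
```

The n_obs column is a quick check that the criteria being ranked refer to the same number of observations.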
Model Selection & Chosen Approach:
final_model=model10
Among the models developed, model10 gives the smallest Akaike and Bayesian information criteria. It is a linear regression model with autoregressive terms (Lag1, Lag_day, Lag_week), moving-window terms (max_in_week, max_in_month), dummy variables for yearly, monthly and hourly seasonality, a trend component, and the given weather variables except the clearly insignificant ones. Therefore, model10 is used as the final model.
Prediction & Results:
Before making predictions, the data is rearranged so that the model uses the provided weather forecasts, which are given as input rather than produced by the model itself.
long_weather1 =
long_weather %>%
arrange(long_weather) %>%
mutate(value= shift(value,-2592))
long_weather1=long_weather1[-c((.N-6047):.N),]
wide_weather1= dcast(long_weather1, date+hour~lat+lon+variable)
data1 <- data.table(merge(wide_weather1,production))
data1[,AverageTEMP:=rowMeans(data1[,c("36.75_33.5_TEMP","36.75_33.25_TEMP","36.75_33_TEMP","36.5_33.5_TEMP","36.5_33.25_TEMP","36.5_33_TEMP","36.25_33.5_TEMP","36.25_33.25_TEMP","36.25_33_TEMP")])]
data1[,AverageREL_HUMIDITY:=rowMeans(data1[,c("36.75_33.5_REL_HUMIDITY","36.75_33.25_REL_HUMIDITY","36.75_33_REL_HUMIDITY","36.5_33.5_REL_HUMIDITY","36.5_33.25_REL_HUMIDITY","36.5_33_REL_HUMIDITY","36.25_33.5_REL_HUMIDITY","36.25_33.25_REL_HUMIDITY","36.25_33_REL_HUMIDITY")])]
data1[,AverageDSWRF:=rowMeans(data1[,c("36.75_33.5_DSWRF","36.75_33.25_DSWRF","36.75_33_DSWRF","36.5_33.5_DSWRF","36.5_33.25_DSWRF","36.5_33_DSWRF","36.25_33.5_DSWRF","36.25_33.25_DSWRF","36.25_33_DSWRF")])]
data1[,AverageCLOUD_LOW_LAYER:=rowMeans(data1[,c("36.75_33.5_CLOUD_LOW_LAYER","36.75_33.25_CLOUD_LOW_LAYER","36.75_33_CLOUD_LOW_LAYER","36.5_33.5_CLOUD_LOW_LAYER","36.5_33.25_CLOUD_LOW_LAYER","36.5_33_CLOUD_LOW_LAYER","36.25_33.5_CLOUD_LOW_LAYER","36.25_33.25_CLOUD_LOW_LAYER","36.25_33_CLOUD_LOW_LAYER")])]
data1 <- data1[order(hour,decreasing = F)]
data1 <- data1[order(date,decreasing = F)]
data1[, Year:=as.factor(year(date))]
data1[,Month := as.factor(month(date))]
data1[,Hour_factor := as.factor(hour)]
data1[,max_in_month:=runmax(x=data$production, k=720, align = "left")]
data1[,max_in_week:=runmax(x=data$production, k=168, align ="left")]
data1[hour<=5|hour>=21,night:=1]
data1[hour<21&hour>5,night:=0]
data1$night <- as.factor(data1$night)
data1[,Lag1:=c(NA, data1$production[1:(.N-1)])]
data1[,Lag_week:=c(rep(NA,168), data1$production[1:(.N-24*7)])]
data1[,Lag_day:=c(rep(NA,24), data1$production[1:(.N-24)])]
data1[, Trend:=(1:.N)]
colnames(data1) <- c("date","hour","CLOUD_LOW_LAYER_36.25_33","DSWRF_36.25_33","REL_HUMIDITY_36.25_33","TEMP_36.25_33","CLOUD_LOW_LAYER_36.25_33.25","DSWRF_36.25_33.25","REL_HUMIDITY_36.25_33.25","TEMP_36.25_33.25","CLOUD_LOW_LAYER_36.25_33.5","DSWRF_36.25_33.5","REL_HUMIDITY_36.25_33.5","TEMP_36.25_33.5","CLOUD_LOW_LAYER_36.5_33","DSWRF_36.5_33","REL_HUMIDITY_36.5_33","TEMP_36.5_33","CLOUD_LOW_LAYER_36.5_33.25","DSWRF_36.5_33.25","REL_HUMIDITY_36.5_33.25","TEMP_36.5_33.25","CLOUD_LOW_LAYER_36.5_33.5","DSWRF_36.5_33.5","REL_HUMIDITY_36.5_33.5","TEMP_36.5_33.5","CLOUD_LOW_LAYER_36.75_33","DSWRF_36.75_33","REL_HUMIDITY_36.75_33","TEMP_36.75_33","CLOUD_LOW_LAYER_36.75_33.25","DSWRF_36.75_33.25","REL_HUMIDITY_36.75_33.25","TEMP_36.75_33.25","CLOUD_LOW_LAYER_36.75_33.5","DSWRF_36.75_33.5","REL_HUMIDITY_36.75_33.5","TEMP_36.75_33.5","production","AverageTEMP","AverageREL_HUMIDITY","AverageDSWRF","AverageCLOUD_LOW_LAYER","Year","Month","Hour_factor","max_in_month","max_in_week","night","Lag1","Lag_week","Lag_day","Trend")
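# Use the last 72 hourly rows as the prediction window
tmp= data1[(.N-71):(.N)]
predictions = rep(0,72)
for(i in 1:72) {
predictions[i] = predict(final_model,newdata = tmp[i,])
# Feed the prediction forward as Lag1 of the next hour (there is no next row after the last step)
if(i<72){tmp[i+1,"Lag1"] = predictions[i]}
# Production cannot be negative, so negative predictions are set to zero
if(predictions[i]<0){predictions[i]=0}
}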
predictions[49:72]
## [1] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## [7] 1.2256838 9.9448617 21.7291328 28.8898355 29.9850460 30.6944920
## [13] 30.6114639 30.6471145 27.9944763 24.3377290 18.0918973 8.5723744
## [19] 0.7482614 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
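The loop above is a recursive forecast: each prediction becomes the Lag1 input of the next step. A minimal self-contained sketch of the same idea on a simulated series is given below. The data, the horizon h and the variable names are assumptions for illustration. Unlike the loop above, this sketch truncates a negative prediction at zero before feeding it forward, which is one possible design choice.

```r
set.seed(42)
n <- 300
# Simulated AR(1)-type series standing in for hourly production
y <- numeric(n)
for (t in 2:n) y[t] <- 2 + 0.6 * y[t - 1] + rnorm(1)

# Regression of the series on its first lag
train <- data.frame(y = y[-1], Lag1 = y[-n])
fit <- lm(y ~ Lag1, data = train)

h <- 5                      # forecast horizon (hypothetical)
preds <- numeric(h)
last_value <- y[n]          # last observed value seeds the recursion
for (i in seq_len(h)) {
  raw <- predict(fit, newdata = data.frame(Lag1 = last_value))
  preds[i] <- max(unname(raw), 0)   # truncate at zero before reuse
  last_value <- preds[i]            # prediction becomes the next Lag1
}
print(preds)
```

Since every step builds on earlier predictions, errors accumulate with the horizon, which is why the lags based on observed values (daily and weekly) are useful in the two-days-ahead setting.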
Conclusion:
To understand the behavior of the data, autoregressive, moving average and ARIMA methods are used. Trend and seasonality terms are added to capture the time dependency and autocorrelation of the data. After that, the temperature, relative humidity, downward shortwave radiation flux and cloud cover variables are added to and removed from the models to see how they relate to the production rate. In addition to the auto.arima trials, different values of p (autoregressive order), d (degree of differencing) and q (moving average order) are tried for the ARIMA models.
Model10 is a linear regression model with trend and seasonality components. It contains autoregressive terms at lags 1, 24 and 168 to account for hourly, daily and weekly dependence, dummy variables for hour, month and year, and moving-window terms that give the maximum production level within the week and within the month. Model10 is used as the final model because it gives the best AIC, BIC and adjusted R-squared values among the models developed.
The residuals of the final model are not perfect white noise, as they ideally should be (the Breusch-Godfrey test rejects the hypothesis of no serial correlation). However, there is no visible seasonality left in the autocorrelation function, and the autocorrelations that remain at some lags are small. The residuals appear to be approximately normally distributed around a zero mean with constant variance. Since no visible trend or seasonality is left in the residuals, the predictions of this model can be used. The predictions are made two days ahead, using the weather forecast variables that belong to the day being predicted and model10 fitted with the data available two days before.
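The kind of residual check described above can be reproduced with base R alone. The sketch below is an illustration on simulated data with deliberately autocorrelated errors, not a rerun of the report's diagnostics; it uses Box.test (Ljung-Box) instead of the Breusch-Godfrey test reported by checkresiduals, and the lag choices are assumptions.

```r
set.seed(7)
n <- 400
x <- rnorm(n)
# Errors with AR(1) structure, so the residuals are not white noise
e <- as.numeric(arima.sim(list(ar = 0.5), n = n))
y <- 1 + 2 * x + e

fit <- lm(y ~ x)
res <- residuals(fit)

# Ljung-Box test: a small p-value indicates remaining serial correlation
lb <- Box.test(res, lag = 10, type = "Ljung-Box")
print(lb)

# Sample autocorrelations and the approximate 95% white-noise bound
acf_vals <- acf(res, lag.max = 24, plot = FALSE)
bound <- qnorm(0.975) / sqrt(n)
# Lags (1 to 24) whose autocorrelation exceeds the bound
print(which(abs(acf_vals$acf[-1]) > bound))
```

A test can reject white noise even when the remaining autocorrelations are small in magnitude, especially with many observations, so the size of the autocorrelations is worth reading alongside the p-value.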